# Action recognition
## Xclip Base Patch16 Hmdb 4 Shot
- Author: microsoft
- License: MIT
- Task: Video-to-Text
- Library: Transformers (English)

X-CLIP is a minimal extension of CLIP for general video-language understanding, trained via contrastive learning on (video, text) pairs.

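As a usage sketch (not taken from the listing), the checkpoint can be queried through the Hugging Face transformers X-CLIP classes for text-prompted video classification; the frame count, dummy frames, and label prompts below are illustrative assumptions.

```python
import numpy as np
import torch
from transformers import XCLIPProcessor, XCLIPModel

model_id = "microsoft/xclip-base-patch16-hmdb-4-shot"
processor = XCLIPProcessor.from_pretrained(model_id)
model = XCLIPModel.from_pretrained(model_id)

# Dummy clip: 32 RGB frames of 224x224 (replace with frames decoded from a real
# video; the frame count is an assumption, so check the checkpoint config).
video = list(np.random.randint(0, 255, size=(32, 224, 224, 3), dtype=np.uint8))
labels = ["brush hair", "climb stairs", "play guitar"]  # example action prompts

inputs = processor(text=labels, videos=video, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_video holds the video-text similarity score for each prompt.
probs = outputs.logits_per_video.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```
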
## Finetuned ViT Indian Food Classification V3
- Author: DrishtiSharma
- License: Apache-2.0
- Task: Image Classification
- Library: Transformers

This is an image classification model based on google/vit-base-patch16-224-in21k, fine-tuned on the Human_Action_Recognition dataset, where it reaches an accuracy of 93.84%.

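A minimal inference sketch with the transformers pipeline API is shown below; the repository id is inferred from the listed model name and author and may not match the actual checkpoint, and the image path is a placeholder.

```python
from transformers import pipeline

# Assumed repo id, derived from the listed name and author; verify before use.
classifier = pipeline(
    "image-classification",
    model="DrishtiSharma/finetuned-ViT-Indian-Food-Classification-v3",
)

# Any local image path or URL can be passed to the pipeline.
predictions = classifier("example_frame.jpg", top_k=3)
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```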